Incident Report: Stuck Collector on Mantle
Date: 2024-01-12
Time: 13:43 (GMT+3)
Duration: Approximately 5 hours (13:43 to 18:19)
Description
A stuck collector was detected on Mantle. At detection, the current block was 43399676 and the last queried block was 43379008, so data collection was lagging by 20,668 blocks.
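The lag quoted above is the difference between the two block numbers. A minimal Python sketch of that check follows. The function names and the `max_lag_blocks` threshold are hypothetical, chosen for illustration, and are not taken from the actual collector code.

```python
def collector_lag(current_block: int, last_queried_block: int) -> int:
    """Return how many blocks the collector is behind the chain head."""
    return current_block - last_queried_block


def is_stuck(current_block: int, last_queried_block: int,
             max_lag_blocks: int = 1000) -> bool:
    """Flag the collector as stuck once the lag exceeds a threshold.

    max_lag_blocks is an assumed value; a real alert would tune it
    to the chain's block time and the collector's polling interval.
    """
    return collector_lag(current_block, last_queried_block) > max_lag_blocks


# Block numbers recorded in this incident.
lag = collector_lag(43399676, 43379008)
stuck = is_stuck(43399676, 43379008)
```

With the values from this incident, `lag` is 20668 and `stuck` is true under the assumed threshold.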
Root Cause
The suspected cause was an RPC failure. Multiple alerts reported that the collectors were unable to get transactions for Mantle, and the Mantle node was found to have syncing issues.
Impact
Data collection for Mantle was delayed, leaving the last queried block lagging behind the current block.
Timeline
- 13:43 - Abdel first noticed the stuck collector issue.
- 13:54 - Aaron identified a potential RPC issue and related alerts.
- 13:57 - Bedirhan noted the Mantle node's syncing issues and recommended using a public RPC URL with Reblok.
- 18:19 - A fix by Vekil was deployed, allowing collectors to fall back to the public RPC on a per-query basis.
Lessons Learned
The incident highlighted the need for flexible data collection methods and the importance of having fallback mechanisms in place for RPC issues.
Actions Taken
A fix was produced and deployed. All production collectors now fall back to the public RPC on a per-query basis, and the data collectors now support multiple providers.
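To illustrate what per-query fallback across multiple providers can look like, here is a minimal Python sketch. It is not the deployed fix: `query_with_fallback`, `AllProvidersFailed`, and the `run_query` callable are hypothetical names, and the provider URLs in the usage note are placeholders.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllProvidersFailed(Exception):
    """Raised when every configured RPC provider failed for one query."""

    def __init__(self, errors: List[Tuple[str, Exception]]):
        super().__init__(f"all {len(errors)} RPC providers failed")
        self.errors = errors


def query_with_fallback(providers: Sequence[str],
                        run_query: Callable[[str], T]) -> T:
    """Run one query against providers in order, returning the first success.

    providers: RPC URLs in priority order (for example, own node first,
               public RPC last). These are assumptions for illustration.
    run_query: hypothetical callable that executes the query against
               a single provider URL and raises on failure.
    """
    errors: List[Tuple[str, Exception]] = []
    for url in providers:
        try:
            return run_query(url)
        except Exception as exc:  # any failure triggers the fallback
            errors.append((url, exc))
    raise AllProvidersFailed(errors)
```

A collector would call, for example, `query_with_fallback(["https://primary.example/rpc", "https://public.example/rpc"], fetch_transactions)`, where `fetch_transactions` is whatever function performs the RPC call. Because the fallback is decided per query, the primary provider is tried again on the next query and is used as soon as it recovers.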
Related Images/Logs
Escalation link.
Incident Reviewer(s)
Abdel, Bedirhan, Aaron